Extracting Information from Indian First Names
Abstract
A person's first name can reveal important demographic and cultural information about that person. This paper proposes statistical models for extracting three pieces of information from Indian first names: gender, religion, and name validity. The models combine classical features, such as n-grams and Levenshtein distance, with self-observed features such as vowel score and religion belief. The models are rigorously evaluated with several machine learning algorithms, comparing accuracy, F-measure, Kappa statistic, and RMS error. The experimental results are promising and indicate that the proposed models can be used directly in other information extraction systems.
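The two classical features named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the boundary markers `^` and `$`, the function names, and the sample name list are all assumptions made for the example, and the paper's own features (vowel score, religion belief) are not defined in this abstract, so they are not shown.

```python
def char_ngrams(name, n=2):
    """Return the character n-grams of a lowercased name.

    The "^" and "$" boundary markers are an assumption of this sketch;
    they let prefixes and suffixes become distinct features.
    """
    padded = "^" + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def min_distance_to_known(name, known_names):
    """Smallest edit distance from a name to a list of known names.

    Hypothetical helper: one plausible way to turn Levenshtein distance
    into a name-validity feature.
    """
    return min(levenshtein(name.lower(), k.lower()) for k in known_names)


if __name__ == "__main__":
    print(char_ngrams("Ravi", 2))
    print(min_distance_to_known("Pria", ["Priya", "Ravi"]))
```

Feature vectors built this way could then be passed to any standard classifier; which classifiers the paper used is not stated in this abstract.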
Similar resources
A General Approach to Extracting Full Names and Abbreviations for Chinese Entities from the Web
Identifying full names/abbreviations for entities is a challenging problem in many applications, e.g. question answering and information retrieval. In this paper, we propose a general method for extracting full names/abbreviations from Chinese Web corpora. For a given entity, we construct forward and backward query items and submit them to a search engine (e.g. Google), and utilize se...
ONER: Tool for Organization Named Entity Recognition from Affiliation Strings in PubMed Abstracts
Automatically extracting organization names from the affiliation sentences of articles related to biomedicine is of great interest to the pharmaceutical marketing industry, health care funding agencies and public health officials. It will also be useful for other scientists in normalizing author names, automatically creating citations, indexing articles and identifying potential resources or co...
Standardized American Indians: The "Names of Indian tribes and bands" list from the Office of Indian Affairs
The inconsistent spelling of American Indian tribal names at the end of the nineteenth century led in part to the development within the Office of Indian Affairs of an array of 270 standardized identifiers, ranging from Absaroka to Zuñi. These efforts paralleled the simultaneous improvement of a large suite of relevant terms by the United States Board on Geographic Names. Both compilations were...
ESM-IL: Entity Extraction from Social Media Text for Indian Languages @ FIRE 2015 - An Overview
Entity recognition is a very important subtask of information extraction and finds applications in information retrieval, machine translation and other higher-level Natural Language Processing (NLP) applications such as co-reference resolution. Entities are real-world elements or objects such as person names, organization names, product names, and location names. Entities are often referred to as Nam...
Effect of a thermal power plant waste fly ash on leguminous and non-leguminous leafy vegetables in extracting maximum benefits from P and K fertilization
Although the Indian population is largely vegetarian, not much attention has been given to the cultivation of vegetables compared to other crops like cereals, pulses and oil seeds. Therefore, the present study was conducted on two leafy vegetables, spinach (Spinacia oleracea L.) and methi (Trigonella foenum-graecum L.), commonly grown in Aligarh, as two popular vegetables of the Indian diet....